Abstract. Clock synchronization is a fundamental task in distributed systems.
The vast majority of distributed tasks require some sort of synchronization, and clock
synchronization is a very straightforward tool for supplying this. It thus makes sense
to require an underlying clock synchronization mechanism to be highly fault-tolerant.
A self-stabilizing algorithm seeks to attain synchronization once lost; a Byzantine
algorithm assumes synchronization is never lost and focuses on containing the influence
of the permanent presence of faulty nodes. There are efficient self-stabilizing solutions
for clock synchronization as well as efficient solutions that are resilient to Byzantine
faults. In contrast, to the best of our knowledge there is no practical solution that is
self-stabilizing while tolerating the permanent presence of Byzantine nodes. Designing
algorithms that self-stabilize while at the same time tolerating permanent Byzantine
failures presents a special challenge due to the "ambition" of malicious nodes to hamper
stabilization if the system tries to recover from a corrupted state. We present
the first linear-time self-stabilizing Byzantine clock synchronization algorithm. Our
deterministic clock synchronization algorithm is based on the observation that all
clock synchronization algorithms require events for exchanging clock values and
re-synchronizing the clocks to within safe bounds. These events usually need to happen
synchronously at the different nodes. In classic Byzantine algorithms this is fulfilled
or aided by having the clocks initially close to each other, and thus the actual clock
values can be used for synchronizing the events. This implies that clock values cannot
differ arbitrarily, which necessarily renders these solutions non-stabilizing. Our
scheme suggests using an underlying distributed pulse synchronization module that
is uncorrelated to the clock values. The synchronized pulses are used as the events
for re-synchronizing the clock values. The algorithm is very efficient and attains and
maintains high precision of the clocks.

This is an updated version. The original paper appeared in OPODIS'03. The main
difference is the replacement of the pulse synchronization module.

1 Introduction 

On-going faults whose nature is not predictable or that express complex behavior 
are most suitably addressed in the Byzantine fault model. It is the preferred fault
model in order to seal off unexpected behavior within limitations on the number of
concurrent faults. Most distributed tasks require the number of concurrent Byzantine
faults, f, to abide by the ratio 3f < n, where n is the network size. See ^1]
for impossibility results on several consensus related problems such as clock
synchronization. Additionally, it makes sense to require systems to resume operation
after a major failure without the need for an outside intervention and/or a restart of 
the system from scratch. E.g. systems may occasionally experience short periods in 
which more than a third of the nodes are faulty or messages sent by all nodes may 
be lost for some time due to a network failure. 

Such transient violations of the basic fault assumptions may leave the system 
in an arbitrary state from which the protocol is required to resume in realizing its 
task. Typically, Byzantine algorithms do not ensure convergence in such cases, as 
strong assumptions are usually made on the initial state and thus merely focus on 
preventing Byzantine faults from notably shifting the system state away from the 
goal. A self-stabilizing algorithm bypasses this limitation by being designed to con- 
verge within finite time to a desired state from any initial state. Thus, even if the 
system loses its consistency due to a transient violation of the basic fault assump- 
tions (e.g. more than a third of the nodes being faulty, network disconnected, etc.), 
then once the system becomes coherent again the protocol will successfully realize 
the task, irrespective of the resumed state of the system. In trying to combine both 
fault models, Byzantine failures present a special challenge for designing stabilizing 
algorithms due to the "ambition" of malicious nodes to incessantly hamper stabiliza- 
tion, as might be indicated by the remarkably few algorithms resilient to both fault 
models. For a short survey of self-stabilization see j3|, for an extensive study see 

The current paper addresses the problem of synchronizing clocks in a distributed 
system. There are several efficient algorithms for self-stabilizing clock synchroniza- 
tion withstanding crash faults (see |13|18|in] . for other variants of the problem 
see |2|15| ). There are many efficient classic Byzantine clock synchronization algo- 
rithms, for a performance evaluation of clock synchronization algorithms see 
However, strong assumptions on the initial state of the nodes are typically made, 
usually assuming all clocks are initially synchronized ([l'8'2.1j) and thus these are 
not self-stabilizing solutions. On the other hand, self-stabilizing clock synchroniza- 
tion algorithms allow initialization with arbitrary clock values, but typically have a 
cost in the convergence times or in the severity of the faults contained. Evidently, 
there are very few self-stabilizing solutions facing Byzantine faults (^21); all with
impractical convergence times. The protocols in are to the best of our knowledge
the first self-stabilizing protocols that are tolerant to Byzantine faults. Note
that self-stabilizing clock synchronization has an inherent difficulty in estimating 
real-time without an external time reference due to the fact that non-faulty nodes 
may initialize with arbitrary clock values. Thus, self-stabilizing clock synchronization
aims at reaching a legal state from which clocks proceed synchronously at the rate
of real-time (assuming that nodes have access to physical timers whose rate is close
to real-time) and not necessarily at estimating real-time. Many applications utilizing
the synchronization of clocks do not really require the exact real-time notion (see 
In such applications, agreeing on a common clock reading is sufficient as long 
as the clocks progress within a linear envelope of any real-time interval. 

We present a Byzantine self-stabilizing clock synchronization protocol with the 
following property: should the system initialize or recover from any transient faults 
with arbitrary clock values then the clocks of the correct nodes proceed synchronously 
at real-time rate. Should the clocks of the correct nodes hold values that are close 
to real-time, then the correct clocks proceed synchronously with high real-time
accuracy. Thus, the protocol we present significantly improves upon existing
Byzantine self-stabilizing clock synchronization algorithms by reducing the time
complexity from expected exponential ([Ej) to deterministic $O(f)$. Our protocol improves
upon existing Byzantine non-stabilizing clock synchronization algorithms by providing
self-stabilization while performing with similar complexity. The self-stabilization
and comparably low complexity are achieved by executing on top of a deterministic
Byzantine self-stabilizing algorithm for pulse synchronization [5]. The interval
between the synchronized pulses is long enough to allow initialization and termi- 
nation of a Byzantine consensus procedure on the clock values, thus attaining and 
maintaining a common clock reading. 

Having access to an outside source of real-time is useful. In such case our approach 
maintains a consistent system state when the outside source fails. 

A special challenge in self-stabilizing clock synchronization is the clock wrap 
around. In non-stabilizing algorithms having a large enough integer eliminates the 
problem for any practical concern. In self-stabilizing schemes a transient failure can 
cause clocks to hold arbitrarily large values, surfacing the issue of clock bounds. Our
clock synchronization scheme handles clock wrap around difficulties. 

The system may be in an arbitrary state in which the communication network 
may behave arbitrarily and in which there may be an unbounded number of concurrent
Byzantine faulty nodes. The algorithm will eventually converge once the
communication network resumes delivering messages within some bounded time, d,
and the number of Byzantine nodes, f, obeys $n \ge 3f + 1$, for a network of
size n. The attained clock precision and accuracy is Ud real-time units, though we 
present an additional scheme that can attain clock precision and accuracy of 3d. The 
convergence time is $O(f')$ communication rounds, where $f' \le f$ is the actual number
of concurrent faults. Our protocol has the additional advantage of a minimal time 
and message overhead during steady-state after the clocks have synchronized. 

An additional advantage of our algorithm is the use of a Byzantine Consensus 
protocol that works in a message driven manner. The basic protocol follows closely
the early stopping Byzantine Agreement protocol of Toueg, Perry and Srikanth |2nj .
The main difference is that the protocol rounds progress at the rate of the actual 
time of information exchange among the correctly operating nodes. This, typically, 
is much faster than progression with rounds whose time lengths are functions of the 
upper bound on message delivery time between correct nodes. 

2 Model and Problem Definition 

The environment is a bounded-delay network model of n nodes that communicate 
by exchanging messages. We assume that the message passing allows for an authen- 
ticated identity of the senders. The communication network does not guarantee any 
order on messages among different nodes. Individual nodes have no access to a cen- 
tral clock and there is no external pulse system. The hardware clock rate (referred to 
as the physical timers) of correct nodes has a bounded drift, $\rho$, from real-time rate.
Consequent to transient failures there can be an arbitrary number of concurrent 
Byzantine faulty nodes, the turnover rate between faulty and non-faulty behavior of 
the nodes can be arbitrary and the communication network may behave arbitrarily. 
Eventually the system behaves coherently again but in an arbitrary state. 

Definition 1. A node is non-faulty at times that it complies with the following:

1. Obeys a global constant $0 < \rho \ll 1$ (typically $\rho \approx 10^{-6}$), such that for every
real-time interval $[u, v]$:

\[ (1 - \rho)(v - u) \le \text{physical timer}(v) - \text{physical timer}(u) \le (1 + \rho)(v - u). \]

2. Operates according to the instructed protocol.

3. Processes any message of the instructed protocol within $\pi$ real-time units of arrival
time.
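
The drift condition in item 1 can be checked mechanically for a single observed interval. The following is a minimal sketch, not part of the protocol; the function name `within_drift` and its argument names are illustrative assumptions, and the bounds are taken as non-strict.

```python
def within_drift(timer_u, timer_v, u, v, rho):
    """Return True if the physical-timer readings taken at real times u <= v
    satisfy (1 - rho)(v - u) <= timer(v) - timer(u) <= (1 + rho)(v - u)."""
    elapsed_real = v - u                # length of the real-time interval [u, v]
    elapsed_timer = timer_v - timer_u   # progress of the physical timer over it
    return (1 - rho) * elapsed_real <= elapsed_timer <= (1 + rho) * elapsed_real
```

A node whose timer gains 0.1 time units over a 10-unit interval violates the bound for a drift constant of 10^-6 and would therefore be counted as faulty under this condition.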

A node is considered faulty if it violates any of the above conditions. We allow 
for Byzantine behavior of the faulty nodes. A faulty node may recover from its faulty 
behavior once it resumes obeying the conditions of a non-faulty node. For consistency 
reasons, the "correction" is not immediate but rather takes a certain amount of time 
during which the non-faulty node is still not counted as a correct node, although 
it supposedly behaves "correctly"^. We later specify the time-length of continuous 
non- faulty behavior required of a recovering node to be considered correct. 

Definition 2. The communication network is non-faulty at periods that it complies 
with the following: 

^ For example, a node may recover with arbitrary variables, which may violate the validity condition 
if considered correct prematurely.

1. Any message sent by any non-faulty node arrives at every non-faulty node within
$\delta$ real-time units;

2. All messages sent by a non-faulty node and received by a non-faulty node obey
FIFO order.

The system is said to be coherent only following some minimal^ amount of time 
of continuous non-faulty behavior of the nodes and the communication network. 

Basic notations: 

We use the following notations though nodes do not need to maintain all of them 
as variables. 

— $d = \delta + \pi$. Thus, when the communication network is non-faulty, d is the upper
bound on the elapsed real-time from the sending of a message by a non-faulty
node until it is received and processed by every correct node.

— $Clock_i$, the clock of node i, is a real value in the range 0 to $M - 1$. Thus $M - 1$
is the maximal value a clock can hold. Its progression rate is a function of node
$p_i$'s physical timer. The clock is incremented every time unit. $Clock_i(t)$ denotes
the value of the clock of node $p_i$ at real-time t.

— $\gamma$ is the target upper bound on the difference of clock readings of any two correct
clocks at any real-time. Our protocol achieves $\gamma = 3d + O(\rho)$.

— Let $a, b, g, h \in \mathbb{R}^+$ be constants that define the linear envelope bound of the
correct clock progression rate during any real-time interval.

— $\Psi_i(t_1, t_2)$ is the amount of clock time elapsed on the clock of node $p_i$ during a
real-time interval $[t_1, t_2]$ within which $p_i$ was continuously correct. The value of
$\Psi_i$ is not affected by any wrap around of $clock_i$ during that period.

— A pulse is an internal event targeted to happen in tight synchrony at all correct
nodes. A Cycle (with upper-case initial letter) is the "ideal" time interval length
between two successive pulses that a node invokes, as given by the user. The
actual cycle length, denoted with lowercase initial, has upper and lower bounds
as a result of faulty nodes and the physical clock skew, denoted $cycle_{max}$ and
$cycle_{min}$ respectively.

— $\sigma$ represents the upper bound on the real-time between the invocation of the
pulses of different correct nodes (tightness of pulse synchronization). The pulse
synchronization procedure in [5] achieves $\sigma = 3d$.

— pulse_conv represents the convergence time of the underlying pulse synchronization
module. The pulse procedure in [5] converges within $6 \cdot cycle$.

— agreement_duration represents the maximum real-time required to complete the
chosen Byzantine consensus procedure used in Section 3.1. We assume

^ An infinitely small time period in which the nodes and the communication network are non-faulty 
has no practical meaning. The required minimal value in our context will be specified later.

$\sigma < \sigma + agreement\_duration < cycle < Cycle + agreement\_duration$. For
simplicity of our arguments we also assume that $M > agreement\_duration$ but this
is not a necessary assumption.

Non-faulty nodes do not initialize with arbitrary values of n, f and Cycle as these
are fixed constants. It is required that Cycle is chosen s.t. $cycle_{min}$ is large enough
to allow our protocol to terminate in between pulses.

A recovering node should be considered correct only once it has been continuously 
non-faulty for enough time to enable it to go through a complete "synchronization 
process". This is the time it takes, from any state, to complete a pulse that is in
synchrony with all other correct nodes and synchronize with the consensus variables. 

Definition 3. The communication network is correct following $\Delta_{net}$ real-time of
continuous non-faulty behavior.^

Definition 4. A node is correct following $\Delta_{node}$ real-time of continuous non-faulty
behavior during a period that the communication network is correct.^

Definition 5. The system is said to be coherent at times that it complies with the
following:

1. (Quorum) At least $n - f$ of the nodes are correct, where $n \ge 3f + 1$;

2. (Network Correctness) The communication network is correct. 

Hence, if the system is not coherent then there can be an unbounded number of 
concurrent faulty nodes; the turnover rate between faulty and non-faulty nodes can 
be arbitrarily large and the communication network may behave arbitrarily. When 
the system is coherent, then the communication network and a large enough fraction 
of the nodes ($n - f$) have been non-faulty for a sufficiently long time period for
the pre-conditions for convergence of the protocol to hold. The assumption in this 
paper, as underlies any other self-stabilizing algorithm, is that eventually the system 
becomes coherent. 

Basic definitions: 

- The clock_state of the system at real-time t is given by:

\[ clock\_state(t) \equiv (clock_0(t), \ldots, clock_{n-1}(t)). \]

- The system is in a synchronized clock_state at real-time t if $\forall$ correct $p_i, p_j$,

\[ (|clock_i(t) - clock_j(t)| \le \gamma) \lor (|clock_i(t) - clock_j(t)| \ge M - \gamma). \]

^ We will use $\Delta_{net} \ge pulse\_conv + agreement\_duration + \sigma$.

^ We will use $\Delta_{node} \ge pulse\_conv + agreement\_duration + \sigma$.

^ The second condition is a result of dealing with bounded clock variables.
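
The synchronized clock_state predicate can be written as a direct pairwise check. This is a minimal sketch for illustration only; the function name `is_synchronized` is an assumption, the clocks are passed as a plain list of the correct nodes' readings, and the boundary cases are treated as non-strict.

```python
from itertools import combinations

def is_synchronized(clocks, gamma, M):
    """Check that every pair of correct clock readings is either within gamma
    of each other, or at least M - gamma apart (one clock has wrapped around)."""
    for ci, cj in combinations(clocks, 2):
        diff = abs(ci - cj)
        if not (diff <= gamma or diff >= M - gamma):
            return False
    return True
```

For instance, with M = 1000 and a precision bound of 5, the readings 998, 999 and 1 count as synchronized because the third clock has merely wrapped around, whereas the readings 0 and 500 do not.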
Definition 6. The "Self-stabilizing Byzantine Clock Synchronization Problem"

Convergence: Starting from an arbitrary system state, s, the system reaches a
synchronized clock_state after a finite time.

Closure: If s is a synchronized clock_state of the system at real-time $t_0$ then
$\forall$ real time $t \ge t_0$,

1. clock_state(t) is a synchronized clock_state,

2. "Linear Envelope": for every correct node, $p_i$,

\[ a \cdot [t - t_0] + b \le \Psi_i(t_0, t) \le g \cdot [t - t_0] + h. \]

The second Closure condition intends to bound the effective clock progression
rate in order to defy a trivial solution.
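
To make the "Linear Envelope" condition concrete, the sketch below evaluates it for one node and one interval. It is an illustration under stated assumptions: the helper names `elapsed_clock` and `in_linear_envelope` are hypothetical, and the number of wrap-arounds during the interval is assumed to be known to the caller.

```python
def elapsed_clock(start, end, wraps, M):
    """Clock time elapsed between two readings of a clock bounded by M,
    given how many times it wrapped around in between."""
    return (end - start) + wraps * M

def in_linear_envelope(psi, t0, t, a, b, g, h):
    """Check a*(t - t0) + b <= psi <= g*(t - t0) + h for elapsed clock time psi."""
    dt = t - t0
    return a * dt + b <= psi <= g * dt + h
```

A clock that never advances has zero elapsed clock time over every interval and fails the lower bound once the interval is long enough, which is the trivial solution that the condition rules out.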

3 Self-stabilizing Byzantine Clock Synchronization 

A major challenge of self-stabilizing clock synchronization is to ensure clock syn- 
chronization even when nodes may initialize with arbitrary clock values. This, as 
mentioned before, requires handling the wrap around of clock values. The algorithm 
we present employs as a building block an underlying self-stabilizing Byzantine pulse 
synchronization procedure presented in [5]. In the pulse synchronization problem
nodes invoke pulses regularly, ideally every Cycle time units. The goal is for the 
different correct nodes to do so in tight synchrony of each other. To synchronize 
their clocks, nodes execute at every pulse Byzantine consensus on the clock value to 
be associated with the next pulse event^. When pulses are synchronized, then the 
consensus results in synchronized clocks. The basic algorithm uses strong consensus 
to ensure that once correct clocks are synchronized at a certain pulse, and thus enter 
the consensus procedure with identical values, then they terminate with the same 
identical values and keep the progression of clocks continuous and synchronized^. 

3.1 The Basic Clock Synchronization Algorithm 

The basic clock synchronization algorithm is essentially a self-stabilizing version of 
the Byzantine clock synchronization algorithm in jH]. 

We call it PBSS-Clock-Synch (for Pulse-based Byzantine Self-stabilizing Clock
Synchronization). The agreed clock time to be associated with the next pulse (next 

^ It is assumed that the time between successive pulses is sufficient for a Byzantine consensus 
algorithm to initiate and terminate in between. 

The pulse synchronization building block does not use the value of the clock to determine its 
progress, but rather intervals measured on the physical timer.

"time for synchronization" in p]) is denoted by ET (for Expected Time, as in (Hj).
Synchronization of clocks is targeted to happen every Cycle time units, unless the
pulse is invoked earlier (or later)^.



Algorithm PBSS-Clock-Synch

at "pulse" event            /* received the internal pulse event */

begin

  1. Clock := ET;

  2. Revoke possible other instances of PBSS-Clock-Synch and
     clear all data structures besides ET and Clock;

  3. Wait until $\sigma(1 + \rho)$ time units have elapsed since pulse;

  4. Next_ET := Byz_Consensus((ET + Cycle) mod M, $\sigma$);

  5. Clock := (Clock + Next_ET - (ET + Cycle)) mod M;    /* posterior adjust. */

  6. ET := Next_ET;

end

Fig. 1. The self-stabilizing Byzantine clock synchronization algorithm
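
The six lines of Fig. 1 can be sketched as a single pulse handler. This is a simplified illustration, not the authors' implementation: the class name `PBSSClockSynchNode`, the injected `byz_consensus` and `wait` callables, and the `buffers` list are all assumptions, and the advance of the clock with the physical timer between the lines is deliberately left out.

```python
class PBSSClockSynchNode:
    """One node's state and pulse handler, following the line numbers of Fig. 1."""

    def __init__(self, M, cycle, sigma, rho, byz_consensus, wait):
        self.M, self.cycle = M, cycle
        self.sigma, self.rho = sigma, rho
        self.byz_consensus = byz_consensus  # assumed: callable(value, sigma) -> agreed value
        self.wait = wait                    # assumed: callable(duration) that blocks
        self.clock = 0                      # may be arbitrary after a transient fault
        self.et = 0                         # may be arbitrary after a transient fault
        self.buffers = []                   # stands in for all other data structures

    def on_pulse(self):
        self.clock = self.et                               # Line 1
        self.buffers.clear()                               # Line 2
        self.wait(self.sigma * (1 + self.rho))             # Line 3
        proposal = (self.et + self.cycle) % self.M
        next_et = self.byz_consensus(proposal, self.sigma) # Line 4
        self.clock = (self.clock + next_et
                      - (self.et + self.cycle)) % self.M   # Line 5
        self.et = next_et                                  # Line 6
```

Passing the consensus and the waiting primitive in as callables keeps the sketch independent of any particular consensus protocol or timer API; a test can substitute a stub that returns a fixed agreed value.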

The internal pulse event is delivered by the pulse synchronization procedure.
We assume the use of the pulse synchronization presented in [5], though any pulse
synchronization algorithm that delivers synchronized pulses by solving the
"Self-stabilizing Pulse Synchronization Problem", in the presence of at most f Byzantine
nodes, where $n \ge 3f + 1$, such as the pulse procedure in 0, can be executed in the
background.

The pulse event aborts any possible on-going invocation of PBSS-Clock-Synch
(and thus any on-going instance of Byz_Consensus) and resets all buffers. The
synchronization of the pulses ensures that the PBSS-Clock-Synch procedure is
invoked within $\sigma$ real-time units of its invocation at all other correct nodes.

Line 1 sets the local clock to the pre-agreed time associated with the current pulse 
event. Line 3 intends to make sure that all correct nodes invoke Byz_Consensus 
only after the pulse has been invoked at all others, without remnants of past in- 
vocations, which are revoked at Line 2. Past remnants may exist only during or 
immediately following periods in which the system is not coherent. 

In Line 4 Byz_Consensus intends to reach consensus on the next value of ET. 
One can use a synchronous consensus algorithm with rounds of size $(\sigma + d)(1 + 2\rho)$
or asynchronous style consensus in which a node waits to get $n - f$ messages of the
previous round before moving to the next round. We assume the use of a Byzantine
consensus procedure tolerating f faults when $n \ge 3f + 1$. A correct node joins



Cycle has the same function as PER in [H].

Byz_Consensus only concomitant to an internal pulse event, as instructed by
the PBSS-Clock-Synch. This contains the possibility of faulty nodes to initiate 
consensus at arbitrary times. 

Line 5 is a posterior clock adjustment. It increments the clock value with the 
difference between the agreed time associated with the next pulse and the node's 
pre-consensus estimate for the time associated with the next pulse (the value which it 
entered the consensus with). This is equivalent to incrementing the value of ET that 
the node was supposed to hold at the pulse according to the agreed Next_ET with 
the elapsed time from the pulse and until the termination of Byz_Consensus. This 
intends to expedite the time to reach synchronization of the clocks. In case that the 
clock_state before Line 5 was not a synchronized clock_state then a synchronized 
clock_state is attained following termination of Byz_Consensus at all correct 
nodes, rather than at the next pulse event. Note that in the case that all correct 
nodes hold the same ET value at the pulse, then the posterior clock adjustment 
adds a zero increment to the clock value. 
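
The effect of the posterior adjustment can be seen on a small worked example. The sketch below is illustrative only; the function name `posterior_adjust` and the numeric values are assumptions chosen for the example, and all quantities are integers to keep the modular arithmetic exact.

```python
def posterior_adjust(clock, et, next_et, cycle, M):
    """Line 5 of Fig. 1: shift the clock by the difference between the agreed
    Next_ET and the node's own pre-consensus estimate ET + Cycle, modulo M."""
    return (clock + next_et - (et + cycle)) % M

M, CYCLE = 1000, 50
ELAPSED = 7          # clock time elapsed since the pulse when consensus terminates
AGREED_NEXT_ET = 150

# Two nodes that entered the pulse with different ET values.
clock_a = posterior_adjust(100 + ELAPSED, 100, AGREED_NEXT_ET, CYCLE, M)
clock_b = posterior_adjust(900 + ELAPSED, 900, AGREED_NEXT_ET, CYCLE, M)

# A node whose own estimate already equals the agreed value, across a wrap-around.
clock_c = posterior_adjust(980 + ELAPSED, 980, (980 + CYCLE) % M, CYCLE, M)
```

In this example the first two nodes end with the same reading even though they started from different ET values, and the third node's clock is left unchanged, matching the zero-increment case described above.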

Note that when the system is not yet coherent, following a chaotic state, pulses 
may arrive to different nodes at arbitrary times, and the ET values and the clocks 
of different nodes may differ arbitrarily. At that time not all correct nodes will join 
Byz_Consensus and no consistent resultant value can be guaranteed. Once the 
pulses synchronize (guaranteed by the pulse synchronization procedure to happen 
within a single cycle) all correct nodes will join the same instance of Byz_Consensus
and will agree on the clock value associated with the next pulse. From that time 
on, as long as the system stays coherent the clock_state remains a synchronized 
clock_state. 

The use of Byzantine consensus tackles the clock wrap-around in a trivial manner 
at all correct nodes. 

Note that instead of simply setting the clock value to ET we could use some 
Clock-Adjustment procedure (cf. jB]), which receives a parameter indicating the target
value of the clock. The procedure runs in the background; it speeds up or slows
down the clock rate to smoothly reach the adjusted value within a specified period 
of time. This procedure should also handle the clock wrap around. 

Theorem 1. PBSS-Clock-Synch solves the "Self-stabilizing Byzantine Clock Syn- 
chronization Problem". 

Proof. Convergence: Let the system be coherent but in an arbitrary state s, with 
the nodes holding arbitrary clock values. Consider the first correct node that com- 
pleted line 3 of the PBSS-Clock-Synch algorithm. Since the system is coherent, all 
correct nodes invoked the preceding pulse within $\sigma$ of each other. At the last pulse all
remnants of previously invoked instances of Byz_Consensus were flushed by all the
correct nodes. A correct node does not initiate or join procedure Byz_Consensus
before waiting $\sigma(1 + \rho)$ time units subsequent to the pulse, hence not before all
correct nodes have invoked a pulse and subsequently flushed their buffers. Thus all
correct nodes will eventually join Byz_Consensus, and so Byz_Consensus will
initiate and terminate successfully.

At termination of the first instance of Byz_Consensus following the synchro- 
nization of the pulses, all correct nodes agree on the clock value to be associated 
with the next pulse invocation. Subsequently, all correct nodes adjust their clocks, 
post factum, according to the agreed ET. Note that this posterior adjustment of the 
clocks does not affect the time span until the invocation of the next pulse but rather
updates the clocks concomitantly to and in accordance with the newly agreed ET. 
This has an effect only if the correct nodes joined Byz_Consensus with differing 
values. Hence if all correct nodes join Byz_Consensus with the same ET then the 
adjustment equals zero. Since all correct pulses arrived within $\sigma$ real-time units of
each other, after the posterior clock adjustment of the last correct node, all correct
clock values are within

\[ \gamma_1 = \sigma(1 + \rho) + (\sigma + agreement\_duration) \cdot 2\rho \]

of each other. The $2\rho$ is the maximal drift rate between any two correct clocks
(whereas $\rho$ is their drift with respect to real-time). Observe that $\gamma_1 < \gamma$ and therefore
the state of the system is a synchronized clock_state. This concludes the Convergence 
condition. 
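
As a rough numeric illustration of the convergence bound, the sketch below evaluates the expression for the clock spread after the first consensus. The function name `gamma_1` and the sample parameter values in the usage note are assumptions chosen for the example, not figures from the paper.

```python
def gamma_1(sigma, rho, agreement_duration):
    """Spread of correct clocks right after the posterior adjustment:
    pulse tightness stretched by drift, plus mutual drift (rate 2*rho)
    accumulated while waiting and running the consensus."""
    return sigma * (1 + rho) + (sigma + agreement_duration) * 2 * rho
```

With a pulse tightness of 0.03 time units, a drift of 10^-6 and a consensus that takes 1 time unit, the bound evaluates to roughly 0.030002, so it is dominated by the pulse tightness rather than by the drift term.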

□ 

Closure: Recall that system coherence is defined as a continuous non-faulty behavior 
of the communication network and a large enough fraction of the nodes for at least 
some minimal period of time. The proof of the Closure condition assumes the correct 
nodes have synchronized their ET values, thus setting this minimal time to be at 
least $cycle_{max}$ + agreement_duration time, ensuring synchronization of the variables.

Let the system be in a synchronized clock_state and w.l.o.g. assume all correct 
nodes hold synchronized and identical ET values. Observe that although the correct 
nodes have synchronized their ET values this does not necessarily imply all correct
nodes hold the same ET value at every point in time. At a brief time subsequent 
to the termination of Byz_Consensus, only a part of the correct nodes may have 
set the ET to the new agreed value while the rest of the correct nodes currently 
holding the old ET value will set ET to the new value in a brief time. We first prove 
the first Closure condition (precision). In this case, each correct node adjusts its
clock immediately subsequent to the pulse, but the posterior clock adjustment has 
no effect since the consensus value equals the value it joined Byz_Consensus with. 
To simplify the discussion assume for now that no wrap around of any correct clock 
takes place during the time that the pulse arrives at the first correct node and until 
the pulse is invoked at the last correct node. Immediately after the pulse is invoked
at the last correct node and its subsequent clock adjustment, all correct clocks are
within $\gamma_0 = \sigma(1 + \rho)$ of each other.

From that point on, clocks of correct nodes drift apart at a rate of $2\rho$ of each other.
As long as no wrap around of the clocks takes place and no pulse arrives at any correct
node, the clocks are at most $\gamma_0 + \Delta T \cdot 2\rho$ apart, where $\Delta T$ is the real-time elapsed
since the invocation of the pulse at the first correct node. To estimate the maximal
clock difference, $\gamma$, at any time, we will consider the following complementary cases:

P1) Prior to the next pulse event at the first correct node.

P2) When a pulse arrives at some correct node. 

P3) Immediately after the last node invokes its next pulse event. 

Note that in this case we do not need to consider the posterior adjustment of the 
clocks at Line 5. 

Case P1 cannot last more than $\Delta T = cycle_{max}$, since by the end of that time
interval all correct nodes will have invoked the pulse, reducing to case P2 or P3. The
discussion above implies $\gamma = \gamma_0 + cycle_{max} \cdot 2\rho$.

Case P3 implies that clock readings are at most $\gamma_0$ apart, since all nodes invoke
the pulses within $\sigma$.

To analyze case P2 consider that the next pulse event has been invoked at some 
node, p. The following situations may take place: 

P2a) Following its clock adjustment, the clock of p holds the maximal clock value 

among all correct clocks at that moment. 
P2b) Following its clock adjustment, the clock of p holds the minimal clock value 

among all correct clocks at that moment. 
P2c) Neither of the above. 

In case P2a, since p holds the maximal clock value, we claim that no other clock 
reading can read less than $ET_{lastpulse} + cycle_{min} \cdot (1 - \rho)$. Assume by contradiction
the existence of a correct node q whose clock reading is less than this value. Further
assume that node q received the same set of messages from the same sources and at
the same time as node p. These events caused node p to invoke its pulse and would
necessarily cause node q to also invoke a pulse. The elapsed time on the clock of node
q between the current pulse and the previous is thus less than $cycle_{min} \cdot (1 - \rho)$, which is
less than $cycle_{min}$ real-time after its previous pulse, a contradiction to the definition
of $cycle_{min}$. Node p just adjusted its clock, which thus reads $ET = ET_{lastpulse} + Cycle$.
Due to the clock skew the clock difference may increase an additional $2\rho\sigma$ until the
node invokes its pulse and the case reduces to P3. The discussion above implies

\[ \gamma = (ET_{lastpulse} + Cycle) - (ET_{lastpulse} + cycle_{min} \cdot (1 - \rho)) + 2\rho\sigma = Cycle - cycle_{min} \cdot (1 - \rho) + 2\rho\sigma. \]

In case P2b, the clock readings of all other nodes that have invoked a pulse can
not be more than $\gamma_0$ apart (case P3). The clock reading of any node that has not
invoked a pulse yet should be less than $cycle_{max}$ following similar reasoning as in
case P2a. Node p just adjusted its clock, which thus reads $ET = ET_{lastpulse} + Cycle$.
Due to the clock skew the clock difference may increase an additional $2\rho\sigma$ until the
node invokes its pulse and the case reduces to P3. The discussion above implies

\[ \gamma = (ET_{lastpulse} + cycle_{max} \cdot (1 + \rho)) - (ET_{lastpulse} + Cycle) + 2\rho\sigma = cycle_{max} \cdot (1 + \rho) - Cycle + 2\rho\sigma. \]

For case P2c, if the nodes holding the minimal clock reading and maximal clock 
reading already invoked pulses, then the clock difference reduces to case P3. 

If neither of the nodes holding the minimal and maximal clock values has
invoked its pulse yet, then the clock difference reduces to case P1.

Otherwise, if either the node holding the minimal or the maximal clock value 
already invoked its pulse then one of the bounds of P2a or P2b holds until the other
node invokes its pulse. 

We now consider the case that a clock wrap around takes place at some $\Delta T$
real-time after the last pulse is invoked in the synchronized cycle. From the
discussion earlier we learn that at the moment before the first correct clock wraps
around, the correct clocks are at most $\gamma$ apart. Therefore, all correct clocks will
wrap around within at most another $\gamma$ time. During the intermediate time, any two
correct clocks, for which one has wrapped around and the other not, satisfy
$|clock_i(t) - clock_j(t)| \ge M - \gamma$. Thus we proved that the maximal clock difference
will remain less than $\gamma$ or greater than $M - \gamma$, which completes the first Closure
condition. 

Henceforth, the bound on the clock differences of correct nodes will equal the
maximum of the three values calculated above. Formally this yields

$$\gamma = \max[\, cycle_{max}\cdot(1+\rho) - Cycle + 2\rho\sigma,\; Cycle - cycle_{min}\cdot(1-\rho) + 2\rho\sigma,\; \sigma(1+\rho) + cycle_{max}\cdot 2\rho \,] .$$

The explicit value is dependent on the relationship between $cycle_{min}$, $cycle_{max}$ and Cycle,
which is determined by the pulse synchronization procedure ([5]). The explicit value
of $\gamma$ is presented in Section 4. This concludes the first Closure condition.
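
As a rough numerical illustration of the bound above, the following Python sketch evaluates the three candidate terms and returns their maximum. The function name `precision_bound` and the sample numbers are assumptions made only for this illustration and are not taken from the paper; exact rational arithmetic is used merely to avoid floating-point noise.

```python
from fractions import Fraction


def precision_bound(cycle_min, cycle_max, cycle, rho, sigma):
    """Return the maximum of the three clock-difference terms (the value gamma).

    All durations share one time unit; rho is the drift rate and sigma the
    pulse synchronization tightness.
    """
    # Term obtained in case P2b: cycle_max*(1+rho) - Cycle + 2*rho*sigma.
    term_p2b = cycle_max * (1 + rho) - cycle + 2 * rho * sigma
    # Term obtained in case P2a: Cycle - cycle_min*(1-rho) + 2*rho*sigma.
    term_p2a = cycle - cycle_min * (1 - rho) + 2 * rho * sigma
    # Third term of the max: sigma*(1+rho) + cycle_max*2*rho.
    term_third = sigma * (1 + rho) + cycle_max * 2 * rho
    return max(term_p2b, term_p2a, term_third)


# Hypothetical sample values, chosen only for illustration.
rho = Fraction(1, 10_000)
gamma = precision_bound(cycle_min=989, cycle_max=1009, cycle=1000, rho=rho, sigma=3)
```

With these sample values the P2a term dominates; other parameter choices may make a different term the maximum.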

For the second Closure condition, note that ^i, as defined in SectionI21 represents 
the actual deviation of an individual correct clock (pi) from the real-time interval 
during which it progresses. This is equivalent to the maximal actual difference be- 
tween the clock value and real-time during a real-time interval in which real-time 
and the clock value were equal at the beginning of the interval. The accuracy of the 
clocks is the bound on the actual deviation of correct clocks from any finite real-time 
interval or rate of deviation from the progression of real-time. Thus it suffices to show 
that correct clocks progress with an accuracy that is a linear function of every finite 
real-time interval to satisfy the second Closure condition. 

The clock progression has an inherent deviation from any real-time interval due to
the physical clock skew. In addition, the clocks are repeatedly adjusted at every pulse
in order to tighten the precision, which can further deviate the clocks' progression
from the progression of real-time during the interval. In [5] it is shown that
the pulses progress within a linear envelope of any real-time interval. The accuracy
in a cycle equals the bound on the clock adjustment $|t_{pulse} - ET_{pulse}|$, where $t_{pulse}$
is the clock value at the pulse at the moment prior to the adjustment of the clock
to $ET_{pulse}$. Under perfect conditions, i.e. no clock skew and zero clock adjustment,
$t_{pulse} = ET_{pulse}$. This would further equal real-time should the clocks have been initiated
with real-time values. Thus it suffices to show that the adjustment to the clocks at
every pulse is a linear function of the length of the cycle. The upper and lower bounds
on the value $t_{pulse}$ are determined by the bounds on the effective cycle length and
account for the clock skew and the accuracy of the pulses (bound on the deviation
of the pulses from perfect regularity). Let $cycle_{min}$ and $cycle_{max}$ denote the lower
bound and upper bound respectively on the cycle length in real-time units. Hence,

$$ET_{prev\text{-}pulse} + cycle_{min}\cdot(1-\rho) \le t_{pulse} \le ET_{prev\text{-}pulse} + cycle_{max}\cdot(1+\rho) .$$

The adjustment to the correct clocks, ADJ, is thus bounded by

$$ET_{pulse} - [ET_{prev\text{-}pulse} + cycle_{max}\cdot(1+\rho)] \le ADJ \le ET_{pulse} - [ET_{prev\text{-}pulse} + cycle_{min}\cdot(1-\rho)] ,$$

which translates to

$$ET_{prev\text{-}pulse} + Cycle - [ET_{prev\text{-}pulse} + cycle_{max}\cdot(1+\rho)] \le ADJ \le ET_{prev\text{-}pulse} + Cycle - [ET_{prev\text{-}pulse} + cycle_{min}\cdot(1-\rho)] ,$$

which translates to

$$Cycle - cycle_{max}\cdot(1+\rho) \le ADJ \le Cycle - cycle_{min}\cdot(1-\rho) .$$

As can be seen, the bound on the adjustment to the clock is linear in the effective
cycle length. The bounds on the effective cycle length are guaranteed by the pulse
synchronization procedure to be linear in the default cycle length. Thus the accuracy
of the clocks is within a linear envelope of any real-time interval. The actual
values of $cycle_{min}$ and $cycle_{max}$ are determined by the specific pulse synchronization
procedure used. This concludes the Closure condition.

□ 

Thus the algorithm is self-stabilizing and performs correctly with f Byzantine
nodes for $n \ge 3f + 1$. □
3.2 A Clock Synchronization Algorithm without Consensus

We suggest a simple additional Byzantine self-stabilizing clock synchronization algo- 
rithm using pulse synchronization as a building block that does not use consensus. 

Our second algorithm resets the clock at every pulse (see the footnote on this
approach). This approach has the advantage that the nodes never need to exchange and
synchronize their clock values and thus do not need to use consensus. This version is
useful, for example, when M, the upper bound on the clock value, is relatively small.
The algorithm has the disadvantage that for a large value of M, a large Cycle value is
required. This enhances the effect of the clock skew, thus negatively affecting the
precision and the accuracy at the end of the cycle. Note that the precision and accuracy
of Cycle-Wrap-CS equal those of PBSS-Clock-Synch.



Algorithm Cycle-Wrap-CS

at "pulse" event            /* received the internal pulse event */
begin
    Clock := 0;
end

Fig. 2. Additional CS algorithm in which the clock wraps around every cycle



3.3 A Clock Synchronization Algorithm using an Approximate 
Agreement Approach 

We suggest an additional self-stabilizing Byzantine clock synchronization algorithm 
using pulse synchronization as a building block, denoted APPROX-CS. 

The algorithm uses an approximate agreement approach in order to get continuous
clocks with high precision and accuracy at the expense of the message complexity
and the early-stopping property. The precision and the accuracy are $2\sigma + O(\rho)$ and thus
improve on those of PBSS-Clock-Synch.

In Line 4 of Approx-CS the nodes invoke approximate-like agreement on their
local clock value at the time of the last pulse, denoted Clock-at-pulse. In case the
system state was a synchronized clock_state, the resultant value Clock_consensus
is guaranteed by Approx_Byz_Agree to be in the range of the initial clock
values of the correct nodes. If the clocks were not synchronized then the resultant
agreed value may be in any range. In Line 5 every correct node sets its clock to equal
the agreed clock value associated with the last pulse, Clock_consensus, incremented
by the time that has elapsed on its local timer since the pulse.

Footnote (Section 3.2): The approach of resetting the clock at every pulse has been suggested by Shlomi Dolev as well.
Algorithm Approx-CS

at "pulse" event            /* received the internal pulse event */
begin
    1. Clock-at-pulse := Clock;
    2. Revoke possible other instances of Approx-CS and
       clear all data structures besides Clock-at-pulse;
    3. Wait until σ(1 + ρ) time units have elapsed since pulse;
    4. Clock_consensus := Approx_Byz_Agree(Clock-at-pulse);
    5. Clock := (Clock_consensus + elapsed-time-since-pulse) mod M;
end

Fig. 3. Self-stabilizing Byzantine Approximate Clock Synchronization algorithm
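
The clock update in Line 5 of Fig. 3 can be sketched as below. This is only a minimal illustration; the function name `approx_cs_new_clock` and the sample numbers are assumptions introduced here, and the agreement step of Line 4 is taken as already completed.

```python
def approx_cs_new_clock(clock_consensus, elapsed_since_pulse, M):
    """Line 5 of Approx-CS: agreed clock value at the pulse plus the local
    time elapsed since the pulse, wrapped around at M."""
    return (clock_consensus + elapsed_since_pulse) % M


# Hypothetical values: an agreed value close to the wrap-around point.
new_clock = approx_cs_new_clock(clock_consensus=995, elapsed_since_pulse=10, M=1000)
```

The modulo keeps the clock inside its bounded range even when the agreed value plus the elapsed time passes M.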



Algorithm Approx_Byz_Agree(value)
begin
    1. Invoke Byz_Agreement() on value;
    2. After termination of all Byz_Agreement instances (substitute missing values with 0)
       Do:
    3. Find largest set of values within γ + σ of each other (if several, choose set harboring
       smallest value ≥ 0);
    4. Find median of the set, identify its antipode := (median + ⌊M/2⌋) mod M;
    5. Discard the f immediate values from each side of the antipode;
    6. Return the median of the remaining values;
end

Fig. 4. Self-stabilizing Byzantine Approximate Agreement
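
Lines 3-6 of Fig. 4 operate on values that live on a ring of size M. The following Python sketch is one possible reading of these steps, not the authors' implementation. It makes several assumptions that the figure leaves open: candidate sets are arcs of length `window` (standing for γ + σ) that start at one of the agreed values, the "median" of an even-sized list is its upper median, `window` is smaller than M/2, and n > 2f so that some values remain after discarding. The name `approx_select` is hypothetical.

```python
def approx_select(values, M, f, window):
    """Sketch of Lines 3-6 of Approx_Byz_Agree on the agreed multiset `values`."""

    def arc_from(start):
        # Values lying at most `window` clockwise from `start`, in ring order.
        members = [v for v in values if (v - start) % M <= window]
        return sorted(members, key=lambda v: (v - start) % M)

    # Line 3: largest arc; ties are broken by the arc holding the smallest value.
    candidates = [arc_from(v) for v in values]
    best = min(candidates, key=lambda arc: (-len(arc), min(arc)))

    # Line 4: median of the chosen set and its antipode on the ring.
    median = best[len(best) // 2]
    antipode = (median + M // 2) % M

    # Line 5: order all values clockwise from the antipode and drop the f
    # values nearest to it on each side.
    ordered = sorted(values, key=lambda v: (v - antipode) % M)
    remaining = ordered[f:len(ordered) - f]

    # Line 6: median of what is left.
    return remaining[len(remaining) // 2]


# Hypothetical run: three correct clocks straddling the wrap-around point
# and one outlier, with M = 1000, f = 1 and a window of 10.
result = approx_select([995, 998, 2, 500], M=1000, f=1, window=10)
```

In this run the outlier 500 sits next to the antipode and is discarded, so the returned value lies among the clustered values even though they straddle the wrap-around point.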

In order to be self-contained we bring the definition of Approximate Agreement, 
defined in Pj. 

Formally, the goal of $\epsilon$-Approximate Agreement is to reach the following: let there
be n processes $p_1, \ldots, p_n$, each of which starts with an initial value $v_i \in \Re$ and may
decide on a value $d_i \in \Re$.

1. (Approximate Agreement) If $p_i$ and $p_j$ are correct and have decided then
$|d_i - d_j| \le \epsilon$.

2. (Validity) If $p_i$ is correct and has decided then there exist two correct nodes
$p_j, p_k$ such that $v_j \le d_i \le v_k$ (the decision value of every correct node is in the
range of the initial values of the correct nodes).

3. (Termination) All correct nodes eventually decide.

The approximate agreement protocol in jHj cannot be used as-is in the self- 
stabilization model as the notions of "highest" value and "lowest" value are not de- 
fined when nodes can initialize with values reaching their bounds, M. Faulty nodes 
can in this case cause different correct nodes to view the extremes of the values 
as complete opposites. To overcome the lack of total order relation introduced by 
the self-stabilization model, Approx_Byz_Agree thus combines the approximate 
agreement algorithm of (H] with Byzantine agreement as follows: run separate Byzan- 
tine agreements in parallel on every node's value in order to agree on the value of 
each node. Thus all correct nodes will hold identical multisets and henceforth the 
heuristics of ^ will be executed on exactly the same values at all correct nodes. The 
Approx_Byz_Agree procedure satisfies the conditions for classic approximate 
agreement, while being self-stabilizing. 

The Byz_Agreement procedure used is an existing Byzantine agreement procedure,
adapted to use our Broadcast primitive presented in Section A.2 in order to overcome the
lack of any common reference to clock time among the correct nodes.

In Line 1 of Approx_Byz_Agree, every node invokes Byzantine agreement on
its value, within $\sigma$ real-time of each other. Every instance of Approx_Byz_Agree
must terminate within some bounded time, thus all correct nodes can calculate a
time when all the agreement instances have terminated at all correct nodes. In Line 3,
after all the agreement instances have terminated and missing values are substituted
with a 0, a set of supposedly synchronized values is searched for. Note that if not
all instances of Approx_Byz_Agree have terminated within the pre-calculated
time-bound then the system must have been in a non-coherent state. Synchronized
clock values can be up to $\gamma + \sigma$ apart in the values agreed subsequent to Line 2, due
to the pulse uncertainty. In Line 4 the median of the set is identified, and will serve
as an anchor for determining the order relation among the different values. In Lines
4-5, the antipode (in the range 1..M) of the median is identified and the f first values
on each side of this antipode are then discarded. If the system is in a synchronized
clock_state then all values that are outside of the values in the set identified earlier
are discarded. Thus the median of the remaining values, returned in Line 6, is in the
range of the initial values of the correct nodes.

Lemma 1. The Approx_Byz_Agree procedure satisfies all the conditions for
$\epsilon$-Approximate Agreement, for $\epsilon = 0$, when the system is in a synchronized clock_state
(see the footnote following the first paragraph of the proof).

Proof. Note that the validity of Byz_Agreement guarantees that the value decided by
all correct nodes for node i is i's actual input value.

Footnote: The notion "in the range of" remains undefined if the system is not in a
synchronized clock_state. Thus the validity condition remains undefined for this case.
1. Approximate Agreement: All correct nodes hold the same multiset of values
following all terminations of the instances of Byz_Agreement, thus they all
find the same set in Line 3 and hence do the exact same operations in Lines 3-5,
and thus return the same value in Line 6.

2. Validity: Let the system be in a synchronized clock_state. Thus the agreed
clock values for all correct nodes subsequent to executing Line 2 are at most
$\gamma + \sigma$ apart. Hence, the largest set found in Line 3 includes at least n − f values.
We now seek to prove that the decision value is in the range of the initial values
of the correct nodes. Since f < n/3 it follows that all values that are not in the
range (at most f) of this set are discarded in Line 5. Thus all remaining values
must be in the range of the initial values of the correct nodes. In particular, the
median of the remaining values is in the range of the initial values. This completes
the proof of the validity condition.

3. Termination: Follows from the termination of Byz_Agreement.

□ 

The precision, $\gamma$, is the bound on the clock differences of all correct nodes at any
time.

Lemma 2. The precision of Approx_Byz_Agree is $2\sigma + O(\rho)$.

Proof. At the moment after all correct nodes have executed Line 5 in Approx-CS
their clocks differ by at most $\sigma + O(\rho)$, thus the clock differences are at most
$\sigma + O(\rho)$ also at the forthcoming pulse invocation. The precision, $\gamma$, is maximized at the
moment that a correct node has set its clock subsequent to its execution of Line
5 in Approx-CS, while some other node has yet to execute this line.
Following the validity condition, the agreed clock value, Clock_consensus, is within
the initial clock values that were held by the correct nodes at their last pulse. As
the system is in a synchronized clock_state, these initial values were within
$2\sigma + O(\rho)$ of each other. Thus the node that has just adjusted its clock set it to a
value that is within $2\sigma + O(\rho)$ of its clock at the moment before the adjustment. In
particular this adjusted clock value is also within $2\sigma + O(\rho)$ of the clock value of any
other correct node. This observation yields a precision of $\gamma = 2\sigma + O(\rho)$.

□ 

The accuracy equals the maximal clock adjustment, which by the same arguments
as above yields an accuracy of $2\sigma + O(\rho)$.

A self-stabilizing Byzantine approximate agreement algorithm that knows how to 
handle bounded, wrapping values and thus does not need to reach exact agreement 
on every node's value, will supposedly yield a clock synchronization algorithm with 
time and message complexity comparable to PBSS-Clock-Synch with precision 
and accuracy of Approx-CS. To the best of our knowledge no such approximate 
agreement algorithm exists. 
4 Analysis and Comparison to other Clock Synchronization
Algorithms

Our clock synchronization algorithm PBSS-Clock-Synch requires reaching
consensus in every cycle. This implies that the cycle should be long enough to allow
for the consensus procedure to terminate at all correct nodes. This implies having
$cycle_{min} \ge 2\sigma + 3(2f + 4)d$, assuming that the Byz_Consensus procedure takes
(f + 2) rounds of 3d each. The algorithm has the advantage that it uses the full time
to reach consensus only following a catastrophic state in which correct nodes hold
differing ET values. Once in a synchronized clock_state, all correct nodes
participate in the consensus with the same initial consensus value, which thus terminates
within 2 communication rounds only, due to its early-stopping property. Hence,
during steady state, in which the system is in a legal state, the time and message
complexity overhead of PBSS-Clock-Synch is minimal.

For simplicity we also assume M to be large enough so that it takes at least a 
cycle for the clocks to wrap around. 

Note that ^i, defined in Section|2l represents the actual deviation of an individual 
correct clock, pi, from a given real-time interval. The accuracy of the clocks is the 
bound on this deviation of correct clocks from any real-time interval. The clocks are 
repeatedly adjusted in order to minimize the accuracy. Following a synchronization
of the clock values, which is targeted to occur once every Cycle time units, correct
clocks can be adjusted by at most ADJ, where, following Theorem 1,

$$Cycle - cycle_{max}\cdot(1+\rho) \le ADJ \le Cycle - cycle_{min}\cdot(1-\rho) ,$$

which, with $cycle_{min}$ and $cycle_{max}$ determined by the pulse synchronization
procedure to equal $Cycle - 11d$ and $Cycle + 9d$ respectively, translates to

$$-9d(1+\rho) - \rho\cdot Cycle \le ADJ \le 11d(1-\rho) + \rho\cdot Cycle .$$

The accuracy is thus $11d + O(\rho)$ real-time units. Should the initial clock values
reflect real-time then this determines the accuracy of the clocks with respect to
real-time (and not only with respect to real-time progression rate), as long as the system
is coherent and clocks do not wrap around.
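
The substitution above can be checked mechanically. The sketch below plugs $cycle_{min} = Cycle - 11d$ and $cycle_{max} = Cycle + 9d$ into the adjustment bounds; the function name `adjustment_bounds` and the sample values of d, Cycle and ρ are assumptions made only for this illustration.

```python
from fractions import Fraction


def adjustment_bounds(cycle, d, rho):
    """Lower and upper bound on the clock adjustment ADJ for the cycle
    bounds cycle_min = Cycle - 11d and cycle_max = Cycle + 9d."""
    cycle_min = cycle - 11 * d
    cycle_max = cycle + 9 * d
    lower = cycle - cycle_max * (1 + rho)
    upper = cycle - cycle_min * (1 - rho)
    return lower, upper


# Hypothetical sample values (exact rationals avoid rounding noise).
d, cycle, rho = 1, 1000, Fraction(1, 10_000)
lower, upper = adjustment_bounds(cycle, d, rho)
```

For small ρ the upper bound is close to 11d, matching the stated accuracy of $11d + O(\rho)$.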

Recall that the precision, $\gamma$, is the bound on the difference between correct clock
values at any time. This bound is largely determined by the maximal clock value
difference at the time in which a correct node has just set its clock and some other
correct node is about to do so shortly. It is guaranteed by Theorem 1 and
the pulse synchronization tightness $\sigma = 3d$ of [5] to be:

$$\begin{aligned}
\gamma &= \max[\, cycle_{max}\cdot(1+\rho) - Cycle + 2\rho\sigma,\; Cycle - cycle_{min}\cdot(1-\rho) + 2\rho\sigma,\; \sigma(1+\rho) + cycle_{max}\cdot 2\rho \,] \\
&= \max[\, 9d(1+\rho) + \rho\cdot Cycle + 2\rho\sigma,\; 11d(1-\rho) + \rho\cdot Cycle + 2\rho\sigma,\; 3d(1+\rho) + (Cycle + 9d)\cdot 2\rho \,] \\
&= 11d(1-\rho) + \rho\cdot Cycle + 2\rho\sigma = 11d + O(\rho) .
\end{aligned}$$

Table 1. Comparison of clock synchronization algorithms ($\epsilon$ is the uncertainty of
the message delay). The convergence time is in pulses for the algorithms utilizing
a global pulse system and in rounds for the other semi-synchronous protocols.
PT-SYNC assumes the use of shared memory and thus the "message complexity" is of
the "equivalent messages". The '*' denotes the use of a global pulse or global clock
tick system.



The bound on the difference between correct clock values immediately after all
correct nodes have synchronized their clock value (at Line 1 or Line 5) is $\sigma$.

The only self-stabilizing Byzantine clock synchronization algorithms, to the best
of our knowledge, are published in [11, 12]. Two randomized self-stabilizing Byzantine
clock synchronization algorithms are presented there. They are designed for fully
connected communication graphs, use message passing which allows faulty nodes to send
differing values to different nodes, allow transient and permanent faults during
convergence and require at least 3f + 1 processors. The clocks wrap around, where M is the
upper bound on the clock values held by individual processors. The first algorithm assumes
a common global pulse system and synchronizes in expected $M \cdot 2^{2(n-f)}$ global pulses.
The second algorithm does not use a global pulse system and is thus partially
synchronous, similar to our model. The convergence time of the latter algorithm is
in expected $O((n-f)\,n^{6(n-f)})$ time. Both algorithms thus have drastically higher
convergence times than ours.

In Table 1 we compare the parameters of our protocols to previous classic Byzan- 
tine clock synchronization algorithms, to non-Byzantine self-stabilizing clock syn- 
chronization algorithms and to the prior Byzantine self-stabilizing clock synchroniza- 
tion algorithms. It shows that our algorithm achieves precision, accuracy, message 
complexity and convergence time similar to non-stabilizing algorithms, while being 
self-stabilizing. 

The message complexity of PBSS-Clock-Synch is solely based on the underlying
Pulse and Consensus procedures. Its inherent convergence time is $cycle_{max}$. The
$O(nf^2)$ message complexity as well as the $+3(2f + 5)d$ additive term in the convergence
time come from Byz_Consensus, the specific Byzantine consensus procedure we
use. The pulse synchronization procedure we use has a message complexity
of O(n^) and 6 • cycle convergence time. Note that Byz_Consensus has two early- 
stopping features: It stops in a number of rounds dependent on the actual number 
of faults and if nodes initiate with the same values (same ET values) then it stops 
within 2 rounds. 

Note that some of the algorithms cited in Table 1 refer to $\epsilon$, the uncertainty in
message delivery, rather than d, the end-to-end communication network delay.

The DW-SYNCH and PT-SYNCH algorithms cited in Table 1 make use of global
clock ticks (a common physical timer). Note that this does not make the clock
synchronization problem trivial, as such clock ticks cannot be used to invoke agreement
procedures and the nodes still need to agree on the clock values. The benefit of
utilizing a global pulse system is in the optimal precision and accuracy acquired (see

m)- 
A Appendix - The Consensus and Broadcast Primitives

A.1 The Byz_Consensus Procedure

The Byz_Consensus procedure can implement many of the classical Byzantine
consensus algorithms. It assumes that timers of correct nodes are always within
$\sigma$ of each other. More specifically, we assume that nodes have timers that reset
periodically, say at intervals $\le cycle'$. Let $T_i(t)$ be the reading of the timer at node
$p_i$ at real time t. We thus assume that there exists a bound $\sigma$ such that for every time
t, when the system is coherent,

$$\forall i,j:\ \text{if } \sigma < T_i(t),\, T_j(t) < cycle' - \sigma \text{ then } |T_i(t) - T_j(t)| < \sigma .$$

The bound $\sigma$ includes all drift factors that may occur among the timers of correct
nodes during that period. When the timers are reset to zero it might be that, for
a short period of time, the timers are further apart. The pulse synchronization
algorithm in [5] satisfies the above assumptions and implies $\sigma > d$.

The self-stabilization requirement and the deviation that may arise from any
synchronization assumption imply that any consensus protocol must be carefully
specified. The consensus algorithm will function properly if it is invoked when the
timers of correct nodes are within $\sigma$ of each other. The subtle point is to make sure
that an arbitrary initialization of the procedure cannot cause the nodes to block or
deadlock. Below we show how to update the early stopping Byzantine Agreement
algorithm of Toueg, Perry and Srikanth [20] to become self-stabilizing and to make
it into a general consensus (vs. agreement) procedure.

The procedure does not assume any reference to real-time and no complete
synchronization of the rounds, as is assumed in [20]. Rather it resets the local timers
of correct nodes at each pulse, which thus brings the timers within bounds of each
other. The node invokes the procedure with the value to agree on and the local timer
value. In the procedure nodes also consider all messages accumulated in their buffers
that were accepted prior to the invocation, if they are relevant.

We use the following notations in the description of the consensus procedure: 

- Let $\Phi$ be the duration of time equal to $(\sigma + d)\cdot(1+\rho)$ time units on a correct
node's timer. Intuitively, $\Phi$ can be assumed to be a duration of a "phase" on a
correct node's timer.

- The Broadcast primitive is the primitive defined in Section A.2 and is an
adaptation of the one described in [20]. Note that an accept is issued within the
Broadcast primitive.

The main differences from the original protocol of [20] are:

- Instead of the General in the original protocol we use the notion of a virtual
(faulty) "General": a virtual node whose value is the assumed value of all correct nodes
at a correct execution. It is the value with which the individual nodes invoke
the procedure. Thus, every correct node does a Consensus-broadcast of its
initial Val, in contrast to the original protocol in which only the General does
this. If all correct nodes initiate with the same value and at the same timer time
this will be the agreed value.

- The Consensus-broadcast primitive has been modified by omitting the code 
dealing with the init messages. All correct nodes send an echo of their initial 
values as though they previously received the init message from the virtual Gen- 
eral. 

- It is assumed that the Broadcast and Consensus-broadcast primitives are 
implicitly initiated when a corresponding message arrives. 

Byz_Consensus is presented in a somewhat different style. Each step has a 
condition attached to it, if the condition holds and the timer value assumption holds, 
then the step is to be executed. Notice that only the step needs to take place at a 
specific timer value. 
Procedure Byz_Consensus(Val, T)        /* invoked at p with timer T */

broadcasters := ∅; value := ⊥;

Do Consensus-broadcast(General, Val, T, 1);

by time (T + 2Φ):
    if accepted (General, v, T, 1) then
        value := v;

by time (T + (2f + 4)Φ):
    if value ≠ ⊥ then
        Broadcast(p, value, T, ⌊…⌋ + …);
        stop and return value.

at time (T + 2rΦ):
    if (|broadcasters| < r − 1) then
        stop and return value.

by time (T + 2rΦ):
    if accepted (General, v′, T, 1) and r − 1 distinct messages (q_i, v′, T, i)
       where 2 ≤ i ≤ r, and … then
        value := v′;

Fig. 5. The Byz_Consensus procedure



The Byz_Consensus procedure satisfies the following typical properties:

Termination: The protocol terminates in a finite time;
Agreement: The protocol returns the same value at all correct nodes;
Validity: If all correct nodes invoke the protocol with the same value and time, then the
protocol returns that value;

It also satisfies the following early stopping properties:

ES-1 If all correct nodes invoke the protocol with the same consensus value and with
the same timer value, then they all stop within two "rounds" of information
exchange among correct nodes.

ES-2 If the actual number of faults is f′ ≤ f then the algorithm terminates by
min[T + (2f′ + 6)Φ, T + (2f + 4)Φ] on the timer of each correct node.

Notice that [ES-1] takes in practice significantly less time than the specified upper
bound on the message delivery time.

We first prove the properties of the Consensus-broadcast primitive and later
we prove the correctness of the Byz_Consensus procedure.

Procedure Consensus-broadcast(General, v, τ, 1)

/* invoking a broadcast simulating the General */
/* nodes send specific message with the same τ only once */
/* multiple messages sent by an individual node are ignored */

send (echo, General, v, τ, 1) to all;

by time (τ + Φ):
    if received (echo, General, v, τ, 1) from ≥ n − 2f distinct nodes then
        broadcasters := broadcasters ∪ {General};
    if received (echo, General, v, τ, 1) from ≥ n − f distinct nodes q then
        send (echo′, General, v, τ, 1) to all;

at any time:
    if received (echo′, General, v, τ, 1) from ≥ n − 2f distinct nodes then
        send (echo′, General, v, τ, 1) to all;
    if received (echo′, General, v, τ, 1) from ≥ n − f distinct nodes then
        accept (General, v, τ, 1);

Fig. 6. Consensus-broadcast

The Consensus-broadcast primitive and the Broadcast primitive (defined in
Section A.2) satisfy the following [TPS-*] properties of Toueg, Perry and Srikanth [20],
which are phrased in our system model.

TPS-1 (Correctness) If a correct node p does Broadcast $(p, m, \tau, k)$ by $\tau + (2k - 2)\Phi$
on its timer, then every correct node accepts $(p, m, \tau, k)$ by $\tau + 2k\Phi$ on its timer.

TPS-2 (Unforgeability) If no correct node p does a Broadcast $(p, m, \tau, k)$, then no
correct node accepts $(p, m, \tau, k)$.

TPS-3 (Relay) If a correct node accepts $(p, m, \tau, k)$ by $\tau + 2r\Phi$, for $r \ge k$, on its timer
then every other correct node accepts $(p, m, \tau, k)$ by $\tau + (2r + 2)\Phi$ on its timer.

TPS-4 (Detection of broadcasters) If a correct node accepts $(p, m, \tau, k)$ by $\tau + 2r\Phi$ on
its timer then every correct node has $p \in broadcasters$ by $\tau + (2k + 1)\Phi$ on its
timer. Furthermore, if a correct node p does not Broadcast any message, then
a correct node can never have $p \in broadcasters$.

Additionally, the Consensus-broadcast primitive also satisfies:

TPS-5 (Uniqueness) If a correct node accepts $(General, m, \tau, 1)$, then no correct node
ever accepts $(General, m', \tau, 1)$ with $m' \ne m$.

Notice the differences from the original properties. The detection property does
not require having $r \ge k$. In general, the relay property holds even earlier than $r \ge k$.
The condition $r \ge k$ of when the property can be guaranteed is used to simplify the
possible cases. At $r < k$, if an accept takes place as a result of getting n − f echo
messages, the adversary may cause the relay to take $3\Phi$ by rushing messages to one
correct node and delaying messages to and from others.
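
The threshold rules of Consensus-broadcast (Fig. 6) can be sketched as per-node bookkeeping, as below. This is only an illustration under stated assumptions: the class name `ConsensusBroadcastNode`, the action tuples it returns, and the `before_deadline` flag (standing for "received by τ + Φ on the local timer") are all hypothetical, message transport and timers are left out, and the thresholds are read as "at least n − 2f" and "at least n − f".

```python
class ConsensusBroadcastNode:
    """Threshold bookkeeping of one node for a single (General, tau, 1) instance."""

    def __init__(self, n, f):
        self.n = n
        self.f = f
        self.echo_from = {}            # value -> senders of (echo, ...)
        self.echo_prime_from = {}      # value -> senders of (echo', ...)
        self.echo_prime_sent = set()   # values for which echo' was already sent
        self.broadcasters = set()
        self.accepted = set()

    def _maybe_send_echo_prime(self, value):
        # Each specific message is sent only once.
        if value in self.echo_prime_sent:
            return []
        self.echo_prime_sent.add(value)
        return [("send_echo_prime", value)]

    def on_echo(self, sender, value, before_deadline):
        """Handle (echo, General, value, tau, 1); rules apply only by tau + Phi."""
        senders = self.echo_from.setdefault(value, set())
        senders.add(sender)  # repeated copies from the same sender count once
        if not before_deadline:
            return []
        actions = []
        if len(senders) >= self.n - 2 * self.f:
            self.broadcasters.add("General")
        if len(senders) >= self.n - self.f:
            actions += self._maybe_send_echo_prime(value)
        return actions

    def on_echo_prime(self, sender, value):
        """Handle (echo', General, value, tau, 1); these rules apply at any time."""
        senders = self.echo_prime_from.setdefault(value, set())
        senders.add(sender)
        actions = []
        if len(senders) >= self.n - 2 * self.f:
            actions += self._maybe_send_echo_prime(value)
        if len(senders) >= self.n - self.f and value not in self.accepted:
            self.accepted.add(value)
            actions.append(("accept", value))
        return actions


# Hypothetical run with n = 4, f = 1.
node = ConsensusBroadcastNode(n=4, f=1)
first = node.on_echo("a", "v", before_deadline=True)
second = node.on_echo("b", "v", before_deadline=True)
third = node.on_echo("c", "v", before_deadline=True)
```

Keeping the sender sets per value makes the two thresholds explicit: n − 2f matching messages guarantee that at least one came from a correct node, while n − f is the number of messages that all correct nodes together can supply.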
Theorem 2. The Consensus-broadcast primitive satisfies the five [TPS-*] properties.

Proof.

Correctness: If all correct nodes send $(echo, General, v, \tau, 1)$ at time $\tau$ on their timers,
then, since messages between correct nodes are delivered within one phase, every correct
node receives $(echo, General, v, \tau, 1)$ from n − f correct nodes by $\tau + \Phi$ on its timer.
Thus each correct node sends $(echo', General, v, \tau, 1)$ by that time and will accept
$(General, v, \tau, 1)$ by $\tau + 2\Phi$ on its timer.

Unforgeability: If all correct nodes hold the same initial value v then no correct
node will send $(echo, General, v', \tau, 1)$, thus no correct node will receive n − f distinct
$(echo, General, v', \tau, 1)$ messages. Therefore, no correct node will send
$(echo', General, v', \tau, 1)$ and no correct node will ever receive n − 2f or n − f distinct
$(echo', General, v', \tau, 1)$ messages. Thus, no correct node can accept $(General, v', \tau, 1)$.

Relay: If a correct node accepts $(General, v, \tau, 1)$ by $\tau + 2r\Phi$ on its timer, then it
received n − f distinct $(echo', General, v, \tau, 1)$ messages by that time; n − 2f of these
were sent by correct nodes and all of them will reach all correct nodes
by $\tau + (2r + 1)\Phi$. As a result, all such correct nodes will send $(echo', General, v, \tau, 1)$,
which will be received by all correct nodes. Hence, by $\tau + (2r + 2)\Phi$ on their timers,
all correct nodes will hold n − f distinct $(echo', General, v, \tau, 1)$ messages and will
thus accept $(General, v, \tau, 1)$.

Detection of broadcasters: If a correct node q′ accepts $(General, v, \tau, 1)$ by time
$\tau + 2r\Phi$ on its timer, then node q′ should have received at least n − f distinct
$(echo', General, v, \tau, 1)$ messages, at least n − 2f of which are from correct nodes.
Let q be the first correct node to ever send $(echo', General, v, \tau, 1)$. If q sent it as a
result of receiving n − f such messages, then q is not the first to send. Therefore, it
should have sent it as a result of receiving n − f $(echo, General, v, \tau, 1)$ messages by
time $\tau + \Phi$. Thus, at least n − 2f such messages were sent by correct nodes by time
$\tau$ on their timers and would arrive at all correct nodes by time $\tau + \Phi$ on their timers.
As a result, all will have $General \in broadcasters$.

Uniqueness: Notice that if a correct node sends $(echo', General, v, \tau, 1)$ by time $\tau + \Phi$,
then no correct node sends $(echo', General, v', \tau, 1)$ at any later time. Otherwise,
similarly to the arguments in proving the previous property we get that at least n − f
nodes sent $(echo, General, v, \tau, 1)$ and n − f nodes sent $(echo, General, v', \tau, 1)$. Since
n > 3f, this implies that at least one correct node sent both $(echo, General, v, \tau, 1)$
and $(echo, General, v', \tau, 1)$, and this is not allowed.

Also note that if a correct node accepts $(General, v, \tau, 1)$, then at least one
correct node sends $(echo', General, v, \tau, 1)$, which yields the proof of the Uniqueness
property. □
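
The counting step behind the Uniqueness argument can be spelled out as follows. The set names $A$ and $B$ are introduced only for this illustration: $A$ is the set of nodes that sent an echo message for value v and $B$ the set of nodes that sent an echo message for value v′, each of size at least n − f.

```latex
|A| \ge n - f, \quad |B| \ge n - f
\;\Longrightarrow\;
|A \cap B| \;\ge\; |A| + |B| - n \;\ge\; n - 2f \;>\; f
\qquad (\text{since } n > 3f).
```

Since at most f nodes are faulty, $A \cap B$ must contain at least one correct node, which would then have sent echo messages for two different values.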

Nodes stop participating in Byz_Consensus when they are instructed to do
so. They stop participating in the Broadcast primitive $2\Phi$ after they terminate
Byz_Consensus.
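
To give a feel for the timer bounds used in this appendix, the sketch below computes a phase length and the early-stopping bound of [ES-2]. It assumes the reading $\Phi = (\sigma + d)(1 + \rho)$ of the phase duration; the function names `phase_length` and `early_stopping_deadline` and the sample numbers are hypothetical.

```python
def phase_length(sigma, d, rho):
    """Duration of one phase on a correct node's timer, read as (sigma + d)(1 + rho)."""
    return (sigma + d) * (1 + rho)


def early_stopping_deadline(T, f, f_actual, phi):
    """Timer value by which the procedure terminates when only f_actual <= f
    nodes are actually faulty: min[T + (2f' + 6)Phi, T + (2f + 4)Phi]."""
    return min(T + (2 * f_actual + 6) * phi, T + (2 * f + 4) * phi)


# Hypothetical values: tolerance f = 3, no actual faults, unit phase length.
deadline_no_faults = early_stopping_deadline(T=0, f=3, f_actual=0, phi=1)
deadline_worst_case = early_stopping_deadline(T=0, f=3, f_actual=3, phi=1)
```

The second term of the minimum caps the bound, so the early-stopping term only helps when the actual number of faults is well below f.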

Definition 7.

A node returned a value m if it has stopped and returned value = m.
A node p decides if it stops at that timer time and returns a value $\ne \perp$.
A node p aborts if it stops and returns $\perp$.

Theorem 3. The Byz_Consensus procedure satisfies the Termination property.
When n > 3f, it also satisfies Agreement, Validity and the two early stopping
conditions.

Proof. We prove the five properties of the theorem. We build up the proof through 
the following arguments. 

Lemma 3. If a correct node aborts at time $T + 2r\Phi$ on its timer, then no correct
node decides at a time $T + 2r'\Phi \ge T + 2r\Phi$ on its timer.

Proof. Let p be a correct node that aborts at time $T + 2r\Phi$. In this case it should have
identified exactly r − 2 broadcasters by that time. By the detection of broadcasters
property [TPS-4] no correct node will ever accept $(General, v, T, 1)$ and r − 2 distinct
messages $(q_i, v, T, i)$ for $2 \le i \le r - 1$, since that would have caused all correct nodes
to hold r − 1 broadcasters by time $T + (2r - 1)\Phi$ on their timers. Thus, no correct
node can decide at local-time $T + 2r'\Phi \ge T + 2r\Phi$. □

Lemma 4. If a correct node decides by time τ + 2rΦ on its timer, then every correct
node decides by time τ + 2(r + 1)Φ on its timer.

Proof. Let p be a correct node that decides by time τ + 2rΦ on its timer. We consider
the following cases:

1. r = 1: No correct node can abort by time τ + 2Φ, since the inequality will
not hold. Node p must have accepted (General, v, τ, 1) by τ + 2Φ. By the relay
property [TPS-3] all correct nodes will accept (General, v, τ, 1) by τ + 4Φ on their
timers. Moreover, p invokes Broadcast (p, v, τ, 2), which by the correctness
property [TPS-1] will be accepted by all correct nodes by time τ + 4Φ on their
timers. Thus, all correct nodes will have value ≠ ⊥ and will Broadcast and
stop by time τ + 4Φ on their timers.

2. 2 ≤ r ≤ f + 1: Node p must have accepted (General, v, τ, 1) and also accepted
r − 1 distinct (q_i, v, τ, i) messages for all i, 2 ≤ i ≤ r, by time τ + 2rΦ on its timer.
By Lemma 3 no correct node aborts by that time. By the relay property [TPS-3] each
(q_i, v, τ, i) message will be accepted by all correct nodes by time τ + (2r + 2)Φ
on their timers. Node p does Broadcast (p, v, τ, r + 1) before stopping. By the
correctness property [TPS-1], this message will be accepted by all correct nodes by
time τ + (2r + 2)Φ on their timers. Thus, no correct node will abort by τ + (2r + 2)Φ
and all correct nodes will have value ≠ ⊥ and will decide and stop by that time.

3. r = f + 2: Node p must have accepted (q_i, v, τ, i) messages for all i, 2 ≤ i ≤ f + 2,
by τ + (2f + 4)Φ on its timer, where the f + 1 q_i's are distinct. At least one of
these f + 1 nodes, say q_j, must be correct. By the unforgeability property [TPS-2]
q_j invoked Broadcast (q_j, v, τ, j) by time τ + (2j)Φ on its timer, and decided.
Since j ≤ f + 1 the above arguments imply that by τ + (2f + 4)Φ on their timers
all correct nodes will decide.

□
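
The three cases of the proof can be tabulated with a small helper. This is only an
illustrative sketch: the function name all_decide_by and the convention of measuring
time in whole phases after the invocation time are assumptions made here, not part of
the protocol.

```python
def all_decide_by(r: int, f: int) -> int:
    """Phases after the invocation time by which every correct node has
    decided, given that some correct node decided by phase 2r (Lemma 4)."""
    if not 1 <= r <= f + 2:
        raise ValueError("a decision can only happen in rounds 1..f+2")
    if r == f + 2:
        # Case 3: a correct q_j decided earlier, so everyone is done by 2f + 4.
        return 2 * f + 4
    # Cases 1 and 2: relay and correctness give the bound 2r + 2.
    return 2 * r + 2


if __name__ == "__main__":
    f = 3
    for r in range(1, f + 3):
        print(r, all_decide_by(r, f))
```

In every case the returned bound is at most 2(r + 1) phases, which is the statement of
the lemma.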



Lemma 4 implies that if a correct node decides at time τ + 2rΦ on its timer, then
no correct node aborts at time τ + 2r'Φ. Lemma 3 implies the other direction.

Termination: Lemma 4 implies that if any correct node decides, all decide and stop.
Assume that no correct node decides. In this case, no correct node ever invokes a
Broadcast (q, v, τ, _). By the detection of broadcasters property [TPS-4], no correct
node will ever be considered a broadcaster. Therefore, by time τ + (2f + 4)Φ on
their timers, all correct nodes will have at most f broadcasters and will abort and
stop. □



Agreement: If no correct node decides, then all abort and return the same value.
Otherwise, let p be the first correct node to decide. Therefore, no correct node aborts.
The value returned by p is the value v of the accepted (General, v, τ, 1) message. By
Properties [TPS-3] and [TPS-5] all correct nodes accept (General, v, τ, 1) and no
correct node accepts (General, v', τ, 1) for v ≠ v'. Thus all correct nodes return the
same value. □



Validity: Let all the correct nodes begin with the same value v' and invoke the
protocol with the same timer time (τ). Then, by time τ + Φ on their timers, all
correct nodes receive at least n − 2f distinct (echo, General, v', τ, 1) messages via
the Consensus-broadcast primitive and send (echo', General, v', τ, 1) messages
to all. Hence, all nodes receive at least n − f distinct (echo', General, v', τ, 1)
messages by τ + 2Φ on their timers and thus accept (General, v', τ, 1). Hence in the
Byz_Consensus procedure all correct nodes set their value to v'. By τ + 2Φ on
their timers, all correct nodes will stop and return v'. □

Early-stopping: The first early stopping property [ES-1] is directly implied by
the proof of the validity property. Correct nodes proceed once they receive messages
from n − f nodes, thus it is enough to receive messages from all correct nodes. The
proof of the second early stopping property [ES-2] is identical to the proof of the
termination property. By time τ + (2f' + 4)Φ all will abort unless some correct node
invokes Broadcast by that time on its timer. This implies that by τ + (2f' + 6)Φ
on their timers all correct nodes will always terminate, if the actual number of faults
f' is less than f. □
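
To see what the second early stopping bound buys, the two deadlines can be compared
numerically. The sketch below measures time in whole phases after the invocation time.
The function names are hypothetical, and treating both bounds as simultaneously valid
upper limits is an assumption drawn from the Termination and Early-stopping arguments
above.

```python
def abort_deadline_phases(f: int) -> int:
    """Phases by which all correct nodes abort if nobody decides (Termination)."""
    return 2 * f + 4


def early_stop_deadline_phases(f_actual: int) -> int:
    """Phases by which all correct nodes terminate with f_actual real faults (ES-2)."""
    return 2 * f_actual + 6


def phases_saved(f: int, f_actual: int) -> int:
    """How many phases earlier the ES-2 bound is than the worst-case bound."""
    gap = abort_deadline_phases(f) - early_stop_deadline_phases(f_actual)
    return max(0, gap)


if __name__ == "__main__":
    for f_actual in range(0, 6):
        print(f_actual, phases_saved(5, f_actual))
```

The early stopping bound is strictly smaller than the worst-case bound only when the
actual number of faults is below f − 1, since 2f' + 6 < 2f + 4 exactly when f' < f − 1.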

Thus the proof of the theorem is concluded. □

A.2 The Broadcast Primitive

This section presents the Broadcast (and accept) primitive that is used by the
Byz_Consensus procedure presented earlier, in Section A.1. The primitive follows
the primitive of Toueg, Perry, and Srikanth [20], though here it is presented in a
real-time model.

In the original synchronous model, nodes advance according to phases. This
intuitive lock-step process clarifies the presentation and simplifies the proofs. In this
section, the discussion carefully considers the various timing considerations and proves
that nodes can rush through the protocol and do not need to wait for the completion
of a "phase" in order to move to the next step of the protocol.

Note that when a node invokes the procedure it evaluates all the messages in its 
buffer that are relevant to the procedure. 

The Broadcast primitive satisfies the four [TPS-*] properties, under the
assumption that n > 3f. The proofs below closely follow the original proofs of [20],
in order to make it easier for readers who are familiar with the original proofs.

Lemma 5. If a correct node p_i sends a message at timer time τ_i ≤ τ + rΦ on p_i's
timer, it will be received by each correct node p_j by timer time τ + (r + 1)Φ on p_j's
timer.

Proof. Assume that node p_i sends a message at real time t with timer time τ_i(t) ≤
τ + rΦ. Thus, τ_i(t) ≤ τ + r(σ + d)(1 + ρ). It should arrive at every correct node p_j
within d(1 + ρ) on any correct node's timer. Recall that |τ_i(t) − τ_j(t)| ≤ σ(1 + ρ).
If τ_j(t) ≤ τ_i(t) we are done. Otherwise,

    τ_j(t) ≤ τ_i(t) + σ(1 + ρ) ≤ τ + r(σ + d)(1 + ρ) + σ(1 + ρ).
Procedure Broadcast (p, m, τ, k)

/* executed per such quadruple */
/* nodes send a specific message with the same τ only once */
/* multiple messages sent by an individual node are ignored */

node p sends (init, p, m, τ, k) to all nodes;

by time (τ + (2k − 1)Φ):
    if (received (init, p, m, τ, k) from p) then
        send (echo, p, m, τ, k) to all;

by time (τ + 2kΦ):
    if (received (echo, p, m, τ, k) from ≥ n − 2f distinct nodes q) then
        send (init', p, m, τ, k) to all;
    if (received (echo, p, m, τ, k) from ≥ n − f distinct nodes) then
        accept (p, m, τ, k);

by time (τ + (2k + 1)Φ):
    if (received (init', p, m, τ, k) from ≥ n − 2f distinct nodes) then
        broadcasters := broadcasters ∪ {p};
    if (received (init', p, m, τ, k) from ≥ n − f distinct nodes) then
        send (echo', p, m, τ, k) to all;

at any time:
    if (received (echo', p, m, τ, k) from ≥ n − 2f distinct nodes) then
        send (echo', p, m, τ, k) to all;
    if (received (echo', p, m, τ, k) from ≥ n − f distinct nodes) then
        accept (p, m, τ, k);

end

Fig. 7. Broadcast primitive

By the time (say t') that the message arrives at p_j we get

    τ_j(t') ≤ τ + r(σ + d)(1 + ρ) + σ(1 + ρ) + d(1 + ρ) ≤ τ + (r + 1)Φ.

□
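
The chain of bounds in this proof can be written out in one place. The block below is
a sketch under stated assumptions: τ denotes timer readings, σ the bound on the
difference between correct timers, ρ the drift rate, d the message delay and Φ the
phase length. The identity Φ = (σ + d)(1 + ρ) is assumed here because the first and
last steps of the proof require it; it is not restated in this section.

```latex
% Assumption: \Phi = (\sigma + d)(1 + \rho).
\begin{align*}
\tau_j(t') &\le \tau_j(t) + d(1+\rho)
              && \text{delivery within } d(1+\rho) \text{ on } p_j\text{'s timer} \\
           &\le \tau_i(t) + \sigma(1+\rho) + d(1+\rho)
              && |\tau_i(t) - \tau_j(t)| \le \sigma(1+\rho) \\
           &\le \tau + r\Phi + (\sigma + d)(1+\rho)
              && \tau_i(t) \le \tau + r\Phi \\
           &=   \tau + (r+1)\Phi .
\end{align*}
```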



Lemma 6. If a correct node ever sends (echo', p, m, τ, k) then at least one correct
node must have sent (echo', p, m, τ, k) by timer time τ + (2k + 1)Φ.

Proof. Let t be the earliest timer time by which any correct node q sends the message
(echo', p, m, τ, k). If t > τ + (2k + 1)Φ, node q should have received (echo', p, m, τ, k)
from n − 2f distinct nodes, at least one of which came from a correct node and was sent
prior to timer time τ + (2k + 1)Φ. □

Lemma 7. If a correct node ever sends (echo', p, m, τ, k) then p's (init, p, m, τ, k)
must have been received by at least one correct node by time τ + (2k − 1)Φ.

Proof. By Lemma 6, if a correct node ever sends (echo', p, m, τ, k), then some correct
node q should send it by timer time τ + (2k + 1)Φ. By the procedure, q has received
(init', p, m, τ, k) from at least n − f nodes by timer time τ + (2k + 1)Φ. At least one
of them is a correct node that has received n − 2f (echo, p, m, τ, k) messages by timer
time τ + 2kΦ. One of these was sent by a correct node that should have received
(init, p, m, τ, k) by timer time τ + (2k − 1)Φ, before sending (echo, p, m, τ, k). □

Theorem 4. The Broadcast primitive presented in Figure 7 satisfies properties
[TPS-1] through [TPS-4].

Proof. 

Correctness: Assume that a correct node p invokes Broadcast (p, m, τ, k) by
τ + (2k − 2)Φ on its timer. Every correct node receives (init, p, m, τ, k) and sends
(echo, p, m, τ, k) by τ + (2k − 1)Φ on its timer. Thus, by Lemma 5, every correct node
receives n − f (echo, p, m, τ, k) from distinct nodes by τ + 2kΦ on its timer and
accepts (p, m, τ, k).

Unforgeability: If a correct node p does not invoke Broadcast (p, m, τ, k), it does not
send (init, p, m, τ, k), and no correct node will send (echo, p, m, τ, k) by τ + (2k − 1)Φ
on its timer. Thus, no correct node accepts (p, m, τ, k) by τ + 2kΦ on its timer. If a
correct node accepted (p, m, τ, k) at a later time, it could only be as a result
of receiving n − f distinct (echo', p, m, τ, k) messages, some of which must be from
correct nodes. By Lemma 7, p should have sent (init, p, m, τ, k), a contradiction.

Relay: Notice that r ≥ k, thus even if nodes issue an accept at an earlier time, the
claim holds for the specified times.

The subtle point is when a correct node issues an accept as a result of getting
echo messages. Suppose r = k and the correct node, say q, has received (echo, p, m, τ, k)
from n − f nodes by τ + 2kΦ on its timer. At least n − 2f of them were sent by correct
nodes. Since every correct node among these has sent its message by τ + (2k − 1)Φ,
all those messages should have arrived at every correct node by τ + 2kΦ on its timer.
Thus, every correct node should have sent (init', p, m, τ, k) by τ + 2kΦ on its timer.
As a result, every correct node will receive n − f such messages by τ + (2k + 1)Φ
on its timer and will send (echo', p, m, τ, k) by that time, which will lead all correct
nodes to accept (p, m, τ, k) by τ + (2r + 2)Φ on their timers.

Otherwise, the correct node, say q, accepts (p, m, τ, k) by τ + 2rΦ on its timer as
a result of receiving n − f (echo', p, m, τ, k) by that time. Since at least n − 2f of
these are from correct nodes, they should arrive at any correct node by τ + (2r + 1)Φ on
their timers. As a result, by τ + (2r + 1)Φ, all correct nodes would send
(echo', p, m, τ, k) and by τ + (2r + 2)Φ on their timers all will accept (p, m, τ, k).

Detection of broadcasters: As in the original proof, we first argue the second
part. Assume that a correct node q adds node p to broadcasters. It should have
received n − 2f (init', p, m, τ, k) messages. Thus, at least one correct node has sent
(init', p, m, τ, k) as a result of receiving n − 2f (echo, p, m, τ, k) messages. One of
these should be from a correct node that has received the original Broadcast message
of p.

To prove the first part, we consider two cases similar to those in the proof of the
Relay property. Suppose r = k and the correct node, say q, accepts (p, m, τ, k) as a
result of receiving n − f (echo, p, m, τ, k) by τ + 2kΦ on its timer. At least n − 2f of
them were sent by correct nodes. Since every correct node among these has sent its
message by τ + (2k − 1)Φ, all those messages should have arrived at every correct node
by τ + 2kΦ on its timer. Thus, every correct node should have sent (init', p, m, τ, k) by
τ + 2kΦ on its timer. Consequently, all correct nodes will receive n − f such messages
by time τ + (2k + 1)Φ and will add p to broadcasters.

Otherwise, q accepts (p, m, τ, k) as a result of receiving (echo', p, m, τ, k) from
n − f nodes by τ + 2rΦ (for r > k) on its timer. By Lemma 6 a correct node sent
(echo', p, m, τ, k) by τ + (2k + 1)Φ. It should have received n − f (init', p, m, τ, k)
messages by that time. All such messages that were sent by correct nodes were sent
by τ + 2kΦ on their timers and should arrive at every correct node by τ + (2k + 1)Φ on
its timer. Since there are at least n − 2f such messages, all will add p to broadcasters
by τ + (2k + 1)Φ on their timers. □
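
The threshold rules of the Broadcast primitive in Fig. 7 can be exercised with a small
lock-step model. This is a sketch under simplifying assumptions that are not part of
the paper: every message is delivered within one phase, faulty nodes send the same
messages to everyone (no equivocation), and the names run_broadcast, init_receivers
and byz_kinds are hypothetical. It illustrates the n − 2f and n − f thresholds only and
does not model timers or drift.

```python
def run_broadcast(n, f, init_receivers, byz_kinds=frozenset(), tail_phases=3):
    """Lock-step run of one Broadcast (p, m, tau, k) instance.

    n - f nodes are correct and f are faulty. init_receivers is the number of
    correct nodes that received (init, ...) directly from p. Faulty nodes send
    every message kind in byz_kinds to all nodes. Since every message reaches
    everyone within one phase, all correct nodes see the same counts, so one
    flag per outcome is enough.
    """
    assert n > 3 * f, "the primitive assumes n > 3f"
    correct = n - f
    senders = {"echo": 0, "init'": 0, "echo'": 0}  # correct senders per kind

    def received(kind):
        # Distinct senders of `kind` seen by each correct node.
        return senders[kind] + (f if kind in byz_kinds else 0)

    accepted = broadcaster = False

    # by tau + (2k - 1)Phi: echo an init received from p itself
    senders["echo"] = init_receivers

    # by tau + 2k*Phi
    if received("echo") >= n - 2 * f:
        senders["init'"] = correct
    if received("echo") >= n - f:
        accepted = True

    # by tau + (2k + 1)Phi
    if received("init'") >= n - 2 * f:
        broadcaster = True
    if received("init'") >= n - f:
        senders["echo'"] = correct

    # at any time: echo' amplification, run for a few extra phases
    for _ in range(tail_phases):
        seen = received("echo'")
        if seen >= n - 2 * f:
            senders["echo'"] = correct
        if seen >= n - f:
            accepted = True

    return {"accepted": accepted, "broadcaster": broadcaster}


if __name__ == "__main__":
    everything = frozenset({"echo", "init'", "echo'"})
    print(run_broadcast(7, 2, init_receivers=5))                        # correct p
    print(run_broadcast(7, 2, init_receivers=0, byz_kinds=everything))  # forged
```

With a correct p every correct node accepts and records p as a broadcaster. When no
init was ever sent, the f faulty nodes alone stay below the n − 2f threshold, so
nothing is accepted, which mirrors the Unforgeability argument.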



